NSF PAR Search | NSF Public Access Repository

Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations

https://doi.org/10.1038/s41467-024-49780-2

Diaz, Daniel J; Gong, Chengyue; Ouyang-Zhang, Jeffrey; Loy, James M; Wells, Jordan; Yang, David; Ellington, Andrew D; Dimakis, Alexandros G; Klivans, Adam R (December 2024, Nature Communications)

Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves state-of-the-art (SOTA) performance on accurately identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time, such as: Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, and a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000X fewer proteins and has 548X fewer parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
Full Text Available
Engineering a photoenzyme to use red light

https://doi.org/10.1016/j.chempr.2024.09.017

Carceller, Jose M; Jayee, Bhumika; Page, Claire G; Oblinsky, Daniel G; Mondragón-Solórzano, Gustavo; Chintala, Nithin; Cao, Jingzhe; Alassad, Zayed; Zhang, Zheyu; White, Nathaniel; et al (February 2025, Chem)

Free, publicly-accessible full text available February 1, 2026
Asymmetric Synthesis of α-Chloroamides via Photoenzymatic Hydroalkylation of Olefins

https://doi.org/10.1021/jacs.4c00927

Liu, Yi; Bender, Sophie G; Sorigue, Damien; Diaz, Daniel J; Ellington, Andrew D; Mann, Greg; Allmendinger, Simon; Hyster, Todd K (March 2024, Journal of the American Chemical Society)

Full Text Available
Parallel molecular computation on digital data stored in DNA

https://doi.org/10.1073/pnas.2217330120

Wang, Boya; Wang, Siyuan Stella; Chalk, Cameron; Ellington, Andrew D; Soloveichik, David (September 2023, Proceedings of the National Academy of Sciences)

DNA is an incredibly dense storage medium for digital data. However, computing on the stored information is expensive and slow, requiring rounds of sequencing, in silico computation, and DNA synthesis. Prior work on accessing and modifying data using DNA hybridization or enzymatic reactions had limited computation capabilities. Inspired by the computational power of “DNA strand displacement,” we augment DNA storage with “in-memory” molecular computation using strand displacement reactions to algorithmically modify data in a parallel manner. We show programs for binary counting and Turing universal cellular automaton Rule 110, the latter of which is, in principle, capable of implementing any computer algorithm. Information is stored in the nicks of DNA, and a secondary sequence-level encoding allows high-throughput sequencing-based readout. We conducted multiple rounds of computation on 4-bit data registers, as well as random access of data (selective access and erasure). We demonstrate that large strand displacement cascades with 244 distinct strand exchanges (sequential and in parallel) can use naturally occurring DNA sequence from M13 bacteriophage without stringent sequence design, which has the potential to improve the scale of computation and decrease cost. Our work merges DNA storage and DNA computing, setting the foundation of entirely molecular algorithms for parallel manipulation of digital information preserved in DNA.
Full Text Available
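
For readers unfamiliar with the Rule 110 cellular automaton named in the abstract above, the update rule can be sketched in software as follows. This is only an illustration of the standard Rule 110 lookup table applied to a 4-bit register. It is not the authors' DNA implementation, and the wrap-around (periodic) boundary and the function name `rule110_step` are assumptions made for this example.

```python
# Standard Rule 110 lookup: (left, center, right) -> next state of center.
# The binary pattern 01101110 (decimal 110) gives the rule its name.
RULE_110 = {
    (1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}


def rule110_step(cells):
    """Apply one synchronous Rule 110 update to a list of 0/1 cells.

    Assumption: periodic boundaries (the register wraps around).
    The paper's own boundary handling may differ.
    """
    n = len(cells)
    return [
        RULE_110[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
        for i in range(n)
    ]


if __name__ == "__main__":
    register = [0, 0, 0, 1]  # hypothetical 4-bit starting register
    for _ in range(4):
        print(register)
        register = rule110_step(register)
```

Every cell is updated from the same previous state, which is the sense in which the computation is parallel: each position depends only on its immediate neighbourhood.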
Graphene Field Effect Biosensor for Concurrent and Specific Detection of SARS-CoV-2 and Influenza

https://doi.org/10.1021/acsnano.3c07707

Kumar, Neelotpala; Towers, Dalton; Myers, Samantha; Galvin, Cooper; Kireev, Dmitry; Ellington, Andrew D; Akinwande, Deji (September 2023, ACS Nano)

Full Text Available
Integrated Top-Down and Bottom-Up Mass Spectrometry for Characterization of Diselenide Bridging Patterns of Synthetic Selenoproteins

https://doi.org/10.1021/acs.analchem.2c01433

Watts, Eleanor; Thyer, Ross; Ellington, Andrew D.; Brodbelt, Jennifer S. (August 2022, Analytical Chemistry)

Full Text Available
HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing

Chen, Tianlong; Gong, Chengyue; Diaz, Daniel J; Chen, Xuxi; Wells, Jordan T; Liu, Qiang; Wang, Zhangyang; Ellington, Andrew D; Dimakis, Alexandros G; Klivans, Adam (February 2023, ICLR 2023 https://openreview.net/forum?id=YDJRFWBMNby)

The molecular basis of protein thermal stability is only partially understood and has major significance for drug and vaccine discovery. The lack of datasets and standardized benchmarks considerably limits learning-based discovery methods. We present HotProtein, a large-scale protein dataset with growth temperature annotations of thermostability, containing K amino acid sequences and K folded structures from different species with a wide temperature range. Due to functional domain differences and data scarcity within each species, existing methods fail to generalize well on our dataset. We address this problem through a novel learning framework, consisting of (1) Protein structure-aware pre-training (SAP), which leverages 3D information to enhance sequence-based pre-training; and (2) Factorized sparse tuning (FST), which utilizes low-rank and sparse priors as an implicit regularization, together with feature augmentations. Extensive empirical studies demonstrate that our framework improves thermostability prediction compared to other deep learning models. Finally, we introduce a novel editing algorithm to efficiently generate positive amino acid mutations that improve thermostability. Code is available at https://github.com/VITA-Group/HotProtein.
Full Text Available
Ribosomal incorporation of cyclic β-amino acids into peptides using in vitro translation

https://doi.org/10.1039/D0CC02121K

Lee, Joongoo; Torres, Rafael; Kim, Do Soon; Byrom, Michelle; Ellington, Andrew D.; Jewett, Michael C. (May 2020, Chemical Communications)

We demonstrate in vitro incorporation of cyclic β-amino acids into peptides by the ribosome through genetic code reprogramming. Further, we show that incorporation efficiency can be increased through the addition of elongation factor P.
Full Text Available
Making Security Viral: Shifting Engineering Biology Culture and Publishing

https://doi.org/10.1021/acssynbio.1c00324

Mackelprang, Rebecca; Adamala, Katarzyna P.; Aurand, Emily R.; Diggans, James C.; Ellington, Andrew D.; Evans, Samuel Weiss; Fortman, J. L.; Hillson, Nathan J.; Hinman, Albert W.; Isaacs, Farren J.; et al (February 2022, ACS Synthetic Biology)

Full Text Available
Effective design principles for leakless strand displacement systems

https://doi.org/10.1073/pnas.1806859115

Wang, Boya; Thachuk, Chris; Ellington, Andrew D.; Winfree, Erik; Soloveichik, David (December 2018, Proceedings of the National Academy of Sciences)

Artificially designed molecular systems with programmable behaviors have become a valuable tool in chemistry, biology, material science, and medicine. Although information processing in biological regulatory pathways is remarkably robust to error, it remains a challenge to design molecular systems that are similarly robust. With functionality determined entirely by secondary structure of DNA, strand displacement has emerged as a uniquely versatile building block for cell-free biochemical networks. Here, we experimentally investigate a design principle to reduce undesired triggering in the absence of input (leak), a side reaction that critically reduces sensitivity and disrupts the behavior of strand displacement cascades. Inspired by error correction methods exploiting redundancy in electrical engineering, we ensure a higher-energy penalty to leak via logical redundancy. Our design strategy is, in principle, capable of reducing leak to arbitrarily low levels, and we experimentally test two levels of leak reduction for a core “translator” component that converts a signal of one sequence into that of another. We show that the leak was not measurable in the high-redundancy scheme, even for concentrations that are up to 100 times larger than typical. Beyond a single translator, we constructed a fast and low-leak translator cascade of nine strand displacement steps and a logic OR gate circuit consisting of 10 translators, showing that our design principle can be used to effectively reduce leak in more complex chemical systems.
Full Text Available
